Abstract

For a graph $G$ on $n$ vertices, let $Z(G, \lambda)$ be the partition function of the monomer-dimer system
defined by $Z(G, \lambda) = \sum_k m_k(G) \lambda^k$, where $m_k(G)$ is the number of matchings of cardinality
$k$ in $G$. We develop a constant-time algorithm for approximating $\log Z(G, \lambda)$ at an arbitrary
point $\lambda > 0$ with additive error $\epsilon n$. In the bounded degree model, the query complexity of our
algorithm is polynomial in $1/\epsilon$, and we provide a lower bound quadratic in $1/\epsilon$ for this problem.
This is the first analysis of a sublinear-time algorithm for a #P-complete problem. Our approach
is based on the correlation decay of the Gibbs distribution associated with $Z(G, \lambda)$. We show that
our algorithm approximates the probability for a vertex to be covered by a matching sampled
according to this Gibbs distribution in near-optimal sublinear time. We extend our results
to approximate the average size and the entropy of such a matching with an additive error in
constant time, where again the query complexity is polynomial in $1/\epsilon$ and the lower bound is
quadratic in $1/\epsilon$. Our algorithms are simple to implement and of practical use when dealing
with massive datasets. Our results extend to many other problems where the correlation decay
is known to hold, such as independent sets or the Ising model up to the critical activity.
1 Introduction

The area of sublinear-time algorithms is an emerging area of computer science which has its
roots in the study of massive data sets |7J[IH]. The Internet, social networks and communication
networks are typical examples of graphs with potentially millions of vertices representing
agents, and edges representing possible interactions among those agents. In this paper, we
present sublinear-time algorithms for graph problems. We are concerned more with problems
of counting and statistical inference and less with optimization. For example, in a mobile call
graph, phone calls can be represented as a matching of the graph where each edge has an
activity associated to the intensity of the interactions between the pair of users. Given such a
graph, with local activities on edges, we would like to answer questions like: what is the size
of a typical matching? For a given user, what is the probability of being matched? As another
example, models of statistical physics have been proposed to model social interactions. In
particular, spin systems are a general framework for modeling nearest-neighbor interactions
on graphs. In this setting, the activity associated to each edge makes it possible to model a perturbed
best-response dynamics. Again in this setting, it is interesting to compute estimations for
the number of agents playing a given strategy or the probability for an agent in the graph to
play a given strategy at equilibrium.

There are now quite a few results on sublinear-time algorithms for graph optimization
problems: minimum spanning tree weight [B], minimum vertex cover, maximum matching,
minimum set cover [17, 15, 24, 16]. Our focus in this paper is quite different, as we
study algorithmic problems arising in statistical physics and classical combinatorics [23]
which are not concerned with the computation of an optimum solution for a graph problem.
We now present the monomer-dimer problem, which will be the main focus of our paper (our
techniques also apply to other systems, which will be described in Section 4).

Let $G = (V, E)$ be an undirected graph with $|V| = n$ vertices and $|E| = m$ edges, where we
allow $G$ to contain parallel edges and self-loops. We denote by $N(G, v)$ the set of neighbors of
$v$ in $G$. We consider bounded degree graphs with $\max_v |N(G, v)| \le \Delta$. In a monomer-dimer
system, the vertices are covered by a non-overlapping arrangement of monomers (molecules
occupying one vertex of $G$) and dimers (molecules occupying two adjacent vertices in $G$) [10].
It is convenient to identify monomer-dimer arrangements with matchings; a matching in $G$
is a subset $M \subseteq E$ such that no two edges in $M$ share an endpoint. Thus, a matching of
cardinality $|M| = k$ corresponds exactly to a monomer-dimer arrangement with $k$ dimers
and $n - 2k$ monomers. Let $\mathcal{M}$ be the set of matchings of $G$. To each matching $M$, a weight
$\lambda^{|M|}$ is assigned, where $\lambda > 0$ is called the activity. The partition function of the system
is defined by $Z(G, \lambda) = \sum_{M \in \mathcal{M}} \lambda^{|M|}$, and the Gibbs distribution on the space $\mathcal{M}$ is defined
by $\pi_{G,\lambda}(M) = \lambda^{|M|} / Z(G, \lambda)$. The function $Z(G, \lambda)$ is also of combinatorial interest and is called the
matching polynomial in this context [13]. For example, $Z(G, 1)$ enumerates all matchings
in $G$. From an algorithmic viewpoint, no feasible method is known for computing $Z(G, \lambda)$
exactly for general monomer-dimer systems; indeed, for any fixed value of $\lambda > 0$, the problem
of computing $Z(G, \lambda)$ exactly in a bounded degree graph when $\Delta \ge 5$ is complete for the
class #P of enumeration problems [21]. The focus on these problems then shifted to finding
approximate solutions in polynomial time. For the monomer-dimer problem, the Markov
Chain Monte Carlo method yields a provably efficient algorithm for finding an approximate
solution. Based on the equivalence between the counting problem (computing $Z(G, \lambda)$) and
the sampling problem (according to $\pi_{G,\lambda}$) fl^J, this approach focuses on rapidly mixing
Markov chains to obtain appropriate random samples. A fully polynomial randomized
approximation scheme (FPRAS) for computing the total number of matchings of a given
graph was provided by Jerrum and Sinclair |1H I19j.
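As a concrete illustration of these definitions, the following minimal Python sketch evaluates $Z(G, \lambda)$ by exhaustive enumeration of matchings. It is only meant for very small graphs (the running time is exponential in the number of edges); the function names and the edge-list representation are assumptions made for this example and do not come from the paper.

```python
from itertools import combinations

def matchings(edges):
    """Yield every matching (tuple of pairwise disjoint edges) of the graph."""
    for k in range(len(edges) + 1):
        for subset in combinations(edges, k):
            covered = [v for e in subset for v in e]
            if len(covered) == len(set(covered)):  # no shared endpoint
                yield subset

def partition_function(edges, lam):
    """Z(G, lam) = sum over matchings M of lam^{|M|} (brute force)."""
    return sum(lam ** len(m) for m in matchings(edges))

# Path 0 - 1 - 2: the matchings are {}, {01}, {12}, so Z = 1 + 2*lam.
path = [(0, 1), (1, 2)]
z_path = partition_function(path, 2.0)

# Z(G, 1) counts matchings: a triangle has the empty matching and 3 single edges.
triangle = [(0, 1), (1, 2), (0, 2)]
n_matchings_triangle = partition_function(triangle, 1)
```

The Gibbs probability of a given matching $M$ is then `lam ** len(M) / partition_function(edges, lam)`.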

In order to study sublinear-time algorithms for these problems, we use an alternative
approach based on the concept of correlation decay originating in statistical physics [14],
which has been used to get a deterministic approximation scheme for counting matchings in [1].
It follows already from [10] that the marginals of the probability distribution $\pi_{G,\lambda}$ are local
in nature (which has later been formalised as spatial correlation decay): the local structure
of the graph around a vertex $v$ suffices to compute an approximation of the corresponding
marginal. Our algorithm is then simple to understand: we need only sample a fixed
number of vertices, approximate the marginals associated to these vertices locally and then
from these values output an estimate for the desired quantity. This technique will also work
for other systems as soon as the correlation decay property is known to hold, as shown in
[22] for the independent set problem or in [20] for the anti-ferromagnetic Ising model with
arbitrary field (in both cases when the parameters are below the critical activity). We will
discuss these applications in Section 4.

A graph $G$ is represented by two kinds of oracles $\mathcal{D}_G$ and $\mathcal{O}_G$ such that $\mathcal{D}_G(v)$ returns
the degree of $v \in V$ and $\mathcal{O}_G(v, i)$ returns the $i$th (with $1 \le i \le \Delta$) neighbor of $v$. The
efficiency of an algorithm is measured by its query complexity, i.e. the number of accesses to
$\mathcal{D}_G$ and $\mathcal{O}_G$. If $VAL_G$ denotes a value associated with the graph $G$, we say that $VAL$ is an
$\epsilon$-approximation of $VAL_G$ if $VAL - \epsilon \le VAL_G \le VAL + \epsilon$, where $\epsilon$ ($0 < \epsilon < 1/2$) is specified
as an input parameter. An algorithm is called an $\epsilon$-approximation algorithm for $VAL_G$ if for
any graph $G$, it computes an $\epsilon$-approximation of $VAL_G$ with high constant probability (e.g.,
at least |). In our model, we will consider the case of constant $\Delta$ as $\epsilon$ tends to zero, i.e., we
always first take the limit as $\epsilon \to 0$ and then the limit $\Delta \to \infty$.

Our main contribution is the design of constant-time $\epsilon n$-approximation algorithms for
the partition function $Z(G, \lambda)$, the average size of a matching sampled according to $\pi_{G,\lambda}$ and
the entropy of $\pi_{G,\lambda}$ for a general graph $G$ with degree bound $\Delta$. All these algorithms use
$\tilde{O}\left((1/\epsilon)^{\tilde{O}(\sqrt{\Delta})}\right)$ queries. Note that when $\Delta$ is a constant, our algorithm requires a number
of queries polynomial in $1/\epsilon$. We also show an $\Omega(1/\epsilon^2)$ lower bound on the query complexity
of these problems (when $\Delta$ is fixed).

The main tool of the above algorithms is the approximation of the marginal $p_{G,\lambda}(v)$,
which is the probability that $v$ is not covered by a matching under the Gibbs distribution.
We show that it is possible to estimate $p_{G,\lambda}(v)$ for an arbitrary vertex $v \in V$ within a
(multiplicative) error of $\epsilon > 0$ with near-optimal query complexity $\tilde{O}\left((1/\epsilon)^{\tilde{O}(\sqrt{\Delta})}\right)$.

The rest of the paper is organized as follows. In Section 2 we prove our first main result
concerning local computations for matchings. In Section 3 we use this result to construct
$\epsilon n$-approximation algorithms for problems in the monomer-dimer system and we analyze
lower bounds on their query complexity. We also give some applications of our technique for
approximating the permanent of constant degree expander graphs and the size of a maximum
matching (in this last case, our algorithm is outperformed by [24]). In
Section 4 we show that our technique applies to other systems: independent sets and the
Ising model up to the critical activity.

We test our algorithm on large real-world networks and show that it performs
well not only on small degree graphs but also on small average-degree graphs (Appendix C).

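2 Local computations for matchings

Recall that we defined for all $\lambda > 0$, the Gibbs distribution on matchings of a graph $G$ by:

$$\forall M \in \mathcal{M}, \quad \pi_{G,\lambda}(M) = \frac{\lambda^{|M|}}{Z(G, \lambda)}, \quad \text{where } Z(G, \lambda) = \sum_{M \in \mathcal{M}} \lambda^{|M|}.$$

Our first focus will be on the approximation of the following quantity for a vertex $v \in V$:

$$p_{G,\lambda}(v) := \pi_{G,\lambda}(v \text{ is not covered by } M) = \sum_{M \not\ni v} \pi_{G,\lambda}(M),$$

where $M \not\ni v$ is a matching not covering $v$.
First notice that

$$p_{G,\lambda}(v) = \frac{Z(G \setminus \{v\}, \lambda)}{Z(G, \lambda)}, \qquad (1)$$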


¹ $\tilde{O}$ is a variant of the big O notation that ignores logarithmic factors, e.g., $f(n) = \tilde{O}(g(n))$ is shorthand
for $f(n) = O(g(n) \log^k g(n))$ for some $k$.

where $G \setminus \{v\}$ is the graph obtained from $G$ by removing the vertex $v$ and all incident edges.
Then we have

$$Z(G, \lambda) = Z(G \setminus \{v\}, \lambda) + \lambda \sum_{u \in N(G,v)} Z(G \setminus \{u, v\}, \lambda),$$

so that dividing by $Z(G \setminus \{v\}, \lambda)$, we get

$$p_{G,\lambda}(v) = \frac{1}{1 + \lambda \sum_{u \in N(G,v)} p_{G \setminus \{v\},\lambda}(u)}. \qquad (2)$$

This recursive expression for $p_{G,\lambda}(v)$ is well known and allows one to compute the marginal
$p_{G,\lambda}(v)$ exactly for each $v \in V$. We follow the approach of Godsil [9]. First, we recall the
notion of path-tree associated with a rooted graph $G$: if $G$ is any rooted graph with root $a_0$,
we define its path-tree $T_G(a_0)$ as the rooted tree whose vertex set consists of all finite simple
paths starting at the root $a_0$; whose edges are the pairs $\{P, P'\}$ of the form $P = a_0 \ldots a_k$,
$P' = a_0 \ldots a_k a_{k+1}$ ($k \ge 0$); and whose root is the single-vertex path $a_0$. By a finite simple path,
we mean here a finite sequence of distinct vertices $a_0 \ldots a_k$ ($k \ge 0$) such that $a_i a_{i+1} \in E$ for
all $0 \le i < k$. Note that the notion of path-tree is similar to the more standard notion of
computation tree, the main difference being that for any finite graph $G$, the path-tree is
always finite (although its size might be much larger than the size of the original graph $G$).
The recursion (2) easily implies $p_{G,\lambda}(v) = p_{T_G(v),\lambda}(v)$ and $p_{T_G(v),\lambda}(v) = x_v(v)$, where the
vector $\mathbf{x}(v) = (x_u(v),\ u \in T_G(v))$ solves the recursion:

$$\forall u \in T_G(v), \quad x_u(v) = \frac{1}{1 + \lambda \sum_{w \succ u} x_w(v)}, \qquad (3)$$

where $w \succ u$ if $w$ is a child of $u$ in $T_G(v)$ (by convention a sum over the empty set is zero).
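The recursion on the path-tree can be checked numerically on a small graph. The sketch below is an illustration under assumptions of our own (an adjacency-dictionary representation and hypothetical function names): it evaluates the recursion by carrying the set of vertices on the current root-to-node path, which is exactly what a node of the path-tree records, and compares the result with the ratio $Z(G \setminus \{v\}, \lambda) / Z(G, \lambda)$ computed by brute force.

```python
from itertools import combinations

def marginal(adj, v, lam, removed=frozenset()):
    """p_{G,lam}(v): probability that v is uncovered, via the recursion.

    adj maps each vertex to a list of neighbours; `removed` holds the
    vertices already deleted, i.e. the path from the root in the path-tree.
    """
    removed = removed | {v}
    total = 1.0
    for u in adj[v]:
        if u not in removed:
            total += lam * marginal(adj, u, lam, removed)
    return 1.0 / total

def brute_force_z(edges, lam):
    """Z(G, lam) by enumerating all matchings (tiny graphs only)."""
    total = 0.0
    for k in range(len(edges) + 1):
        for subset in combinations(edges, k):
            ends = [x for e in subset for x in e]
            if len(ends) == len(set(ends)):
                total += lam ** k
    return total

# 4-cycle 0 - 1 - 2 - 3 - 0
cycle_adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
cycle_edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
lam = 1.5

p_recursive = marginal(cycle_adj, 0, lam)
without_v = [e for e in cycle_edges if 0 not in e]
p_ratio = brute_force_z(without_v, lam) / brute_force_z(cycle_edges, lam)
```

On the 4-cycle both quantities equal $(1 + 2\lambda)/(1 + 4\lambda + 2\lambda^2)$.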

Since we need only an approximation for $p_{G,\lambda}(v)$, we now show that we can solve the
recursion (3) only on a truncated path-tree. For any $h \ge 1$, let $T_G^h(v)$ be the path-tree
truncated at depth $h$ and $\mathbf{x}^h(v) = (x_u^h(v),\ u \in T_G^h(v))$ be the solution of the recursion (3)
when the path-tree is replaced by the truncated version $T_G^h(v)$. Clearly $x_v^h(v) = p_{G,\lambda}(v)$ for
any $h \ge n$ and the following lemma gives a quantitative estimate on how large $h$ needs to be
in order to get an $\epsilon$-approximation of $p_{G,\lambda}(v)$.

► Lemma 1. There exists $h(\epsilon, \Delta)$ such that $|\log x_v^h(v) - \log p_{G,\lambda}(v)| \le \epsilon$ for any $h \ge h(\epsilon, \Delta)$.

Moreover $h(\epsilon, \Delta) = \tilde{O}\left(\sqrt{\Delta} \log(1/\epsilon)\right)$ satisfies $\lim_{\Delta \to \infty} \lim_{\epsilon \to 0} \frac{h(\epsilon, \Delta)}{\sqrt{\Delta} \log(1/\epsilon)} = \sqrt{\lambda}$.

Proof. Theorem 3.2 in ]^ proves that: |loga;5^(i;) - logpG,x{v)\ < (1 - )^'/^ log(l + 

AA). The lemma then follows directly by taking /i(e,A) to be the smallest h such that 

We now present the algorithmic implication of Lemma 1. We start with a simple
remark. The exact value for $h(\epsilon, \Delta)$ follows from the proof of the lemma; however, this value
will not be required in what follows, as shown by the following argument: the fact that
$(z_1, \ldots, z_\Delta) \mapsto \left(1 + \lambda \sum_{i=1}^{\Delta} z_i\right)^{-1}$ is strictly decreasing in each variable on $[0, 1]$ implies (by
a simple induction) that for any $k \ge 0$, we have



X] 



Hence by Lemma 1 any algorithm computing $x_v^h(v)$ for increasing values of $h$ and stopping
at the first time two consecutive outputs are such that $|\log x_v^{h+1}(v) - \log x_v^h(v)| \le \epsilon$ will stop
after at most $h(\epsilon, \Delta)$ iterations and the last output will be an $\epsilon$-approximation of $\log p_{G,\lambda}(v)$.

Let $v_1, \ldots, v_n$ be an order of all vertices in $V$, and define the identifier of $v_i$ to be $i$,
for $1 \le i \le n$. When this order is not given, we can generate it using the same
technique as in [16]. If we assign a uniformly random number $a_v \in [0, 1]$ to each vertex $v$
and define the identifier of $v$ as the rank of $a_v$ among the $n$ randomly generated numbers,
the identifiers are again uniformly random. Since the only operation involving the
identifiers is comparison, which amounts to comparing two randomly generated numbers,
we do not need to generate all random numbers in advance. It is sufficient to generate a
random number each time we visit a new vertex and save its value for later visits. So the
cost of generating random numbers is bounded by the number of queries (which will be
proved to be a constant independent of $n$).
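The lazy generation of identifiers described above can be sketched as follows. This is a minimal illustration, not the construction of [16] itself; the class name `LazyIdentifiers` and its methods are hypothetical. A random key is drawn the first time a vertex is visited and memoized, and identifiers are only ever compared through these keys.

```python
import random

class LazyIdentifiers:
    """Assign a uniform random key in [0, 1) to a vertex on its first visit."""

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self._key = {}

    def key(self, v):
        # Draw the number only when v is first visited, then remember it.
        if v not in self._key:
            self._key[v] = self._rng.random()
        return self._key[v]

    def precedes(self, u, v):
        """True if u comes before v in the order induced by the random keys."""
        return self.key(u) < self.key(v)

    def generated(self):
        """Number of random keys drawn so far (at most the number of visited vertices)."""
        return len(self._key)

ids = LazyIdentifiers(seed=0)
first = ids.key("x")
u_before_v = ids.precedes("x", "y")
```

Only two random numbers have been generated after these calls, regardless of the size of the graph.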

Let deg[i] be the degree of $v_i$. Let e[i][j] be the identifier of the $j$th neighbor of $v_i$, for
$1 \le j \le$ deg[i]. The algorithm Approx-Marginal is based on the Depth-First-Search
(DFS) on the path-tree truncated at depth $h$. Let $\ell$ and $k$ be the depth and the identifier of
the current node in the DFS. Let path be an array of the identifiers in the path from the
root to the current node. Define $G_0 = G_1 = G$ and for $k_0 > 1$, $G_{k_0} = G \setminus \{v_1, \ldots, v_{k_0 - 1}\}$.
Parameter $k_0$ restricts the DFS to $G_{k_0}$. The graphs $\{G_k\}_{k \ge 1}$ will be useful in the next section.

Approx-Marginal(λ, ε, k, k0)
    x[1] ← DFS(λ, 1, 1, k, k0)
    x[2] ← DFS(λ, 1, 2, k, k0)
    h ← 2
    while |log x[h] − log x[h−1]| > ε
        do h ← h + 1
           x[h] ← DFS(λ, 1, h, k, k0)
    return x[h]

DFS(λ, ℓ, h, k, k0)
    if ℓ = h
        then return 1
    path[ℓ] ← k
    tmp ← 1
    for i ← 1 to deg[k]
        do visited ← false
           for j ← 1 to ℓ
               do if path[j] = e[k][i]
                   then visited ← true
           if not visited and e[k][i] ≥ k0
               then m ← DFS(λ, ℓ+1, h, e[k][i], k0)
                    tmp ← tmp + λ * m
    return 1/tmp


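► Proposition 2. Algorithm Approx-Marginal($\lambda, \epsilon, k, 1$) outputs an $\epsilon$-approximation of
$\log p_{G,\lambda}(v_k)$ using $Q(\epsilon, \Delta)$ queries where $Q(\epsilon, \Delta) = \tilde{O}\left((1/\epsilon)^{\tilde{O}(\sqrt{\Delta})}\right)$ satisfies

$$\lim_{\Delta \to \infty} \lim_{\epsilon \to 0} \frac{\log Q(\epsilon, \Delta)}{\sqrt{\Delta} \log \Delta \, \log(1/\epsilon)} = \sqrt{\lambda}. \qquad (5)$$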

Proof. Let $h$ be the final height of the truncated path-tree in the above algorithm. It is
sufficient to query $O(\Delta^h)$ nodes in total. The proposition follows by applying the bound
$h(\epsilon, \Delta)$ on $h$ obtained by Lemma 1.
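For readers who prefer executable code, the procedure Approx-Marginal can be sketched in Python as below. This is a minimal sketch under assumptions of our own: the graph is given as an adjacency dictionary `adj` standing in for the degree and neighbor oracles, vertices are labelled directly by their integer identifiers, and the path is passed as a tuple instead of a shared array. The stopping rule is the one described above: increase the truncation depth until two consecutive estimates agree within $\epsilon$ in logarithmic scale.

```python
import math

def dfs(adj, lam, path, h, k0):
    """Value x_u^h at the path-tree node `path` (a tuple of vertex identifiers)."""
    if len(path) == h:            # truncation depth reached
        return 1.0
    total = 1.0
    for u in adj[path[-1]]:       # one neighbour query per iteration
        if u not in path and u >= k0:
            total += lam * dfs(adj, lam, path + (u,), h, k0)
    return 1.0 / total

def approx_marginal(adj, lam, eps, k, k0=0):
    """Estimate p_{G_{k0},lam}(v_k) by deepening the truncated path-tree."""
    prev, h = dfs(adj, lam, (k,), 1, k0), 2
    cur = dfs(adj, lam, (k,), h, k0)
    while abs(math.log(cur) - math.log(prev)) > eps:
        h += 1
        prev, cur = cur, dfs(adj, lam, (k,), h, k0)
    return cur

# 4-cycle 0 - 1 - 2 - 3 - 0 with activity 1.5
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
estimate = approx_marginal(adj, 1.5, 0.01, 0)
# Restricting to identifiers >= 2 leaves the single edge 2 - 3.
restricted = approx_marginal(adj, 1.5, 0.01, 2, k0=2)
```

The loop always terminates on a finite graph, because the truncated recursion becomes exact once the depth exceeds the number of vertices.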

Since $p_{G_{k_0},\lambda}(v_k)$ is at most 1, $Q(\epsilon, \Delta)$ queries are also sufficient to compute an
$\epsilon$-approximation of $p_{G_{k_0},\lambda}(v_k)$. Next we will give a lower bound $Q(\epsilon, \Delta)$ on the query complexity
of this approximation, where $Q(\epsilon, \Delta)$ satisfies the same equation (5), which implies that
Algorithm Approx-Marginal is optimal when the influence of $\epsilon$ is much larger than that
of $\Delta$. The key idea of the lower bound is to show that for any deterministic approximation
algorithm using $o(Q(\epsilon, \Delta))$ queries, there always exist two instances of almost full $\Delta$-ary
trees, such that the algorithm outputs the same value for the two instances, while their exact
values differ by at least $\epsilon$. Thus the two output values cannot both be $\epsilon$-approximations.
We start by studying the full $\Delta$-ary tree of height $h$, denoted $T^h$. Define $y_1 = 1$ and
$y_k = (1 + \lambda \Delta y_{k-1})^{-1}$ for $k \ge 2$. Clearly $y_h$ is equal to the value computed at the root of
$T^h$ by the recursion (3).

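► Lemma 3. We have $\lim_{k \to \infty} y_k = \frac{2}{1 + \sqrt{1 + 4\lambda\Delta}}$. There exists $C(\Delta)$ such that $|y_h - y_{h-1}| \sim
C(\Delta) \rho^{h-1}$ when $h \to \infty$, with $\rho = \frac{\sqrt{1 + 4\lambda\Delta} - 1}{\sqrt{1 + 4\lambda\Delta} + 1}$. Moreover, with $h(\epsilon, \Delta) = \sup\{h : |y_h - y_{h-1}| \ge
\epsilon\}$, we have $\lim_{\Delta \to \infty} \lim_{\epsilon \to 0} \frac{h(\epsilon, \Delta)}{\sqrt{\Delta} \log(1/\epsilon)} = \sqrt{\lambda}$.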

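Proof. To study $y_k$, we introduce the auxiliary sequence $f_k = f_{k-1} + \lambda \Delta f_{k-2}$ for $k \ge 3$
and $f_1 = f_2 = 1$, so that $y_k = f_k / f_{k+1}$ for $k \ge 1$. Let $\alpha = (1 + \sqrt{1 + 4\lambda\Delta})/2$ and
$\beta = (1 - \sqrt{1 + 4\lambda\Delta})/2$; we have $f_k = \frac{1}{2\alpha - 1}(\alpha^k - \beta^k)$ and the first statement of the lemma
follows. A simple computation gives $|y_k - y_{k-1}| \sim \frac{(2\alpha - 1)^2}{\alpha^3} \left(\frac{|\beta|}{\alpha}\right)^{k-1}$ when $k \to \infty$, so
the second statement of the lemma holds with $C(\Delta) = \frac{(2\alpha - 1)^2}{\alpha^3}$. We have

$$\log \frac{|\beta|}{\alpha} = \log\left(\frac{\sqrt{1 + 4\lambda\Delta} - 1}{\sqrt{1 + 4\lambda\Delta} + 1}\right) = -\frac{K(\Delta)}{\sqrt{\lambda\Delta}},$$

where $K(\Delta) \to 1$ as $\Delta \to \infty$, so the last statement of the lemma follows. ◄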
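The behaviour of the sequence $y_k$ is easy to observe numerically. The following sketch (the parameter values and variable names are arbitrary choices for illustration) iterates $y_k = (1 + \lambda \Delta y_{k-1})^{-1}$ from $y_1 = 1$ and compares the limit with $2/(1 + \sqrt{1 + 4\lambda\Delta})$ and the ratio of successive gaps with $|\beta|/\alpha$.

```python
import math

def y_sequence(lam, delta, n):
    """Return [y_1, ..., y_n] with y_1 = 1 and y_k = 1 / (1 + lam * delta * y_{k-1})."""
    ys = [1.0]
    for _ in range(n - 1):
        ys.append(1.0 / (1.0 + lam * delta * ys[-1]))
    return ys

lam, delta = 1.0, 3
root = math.sqrt(1 + 4 * lam * delta)
alpha, beta = (1 + root) / 2, (1 - root) / 2
rho = abs(beta) / alpha               # geometric rate of the gaps

ys = y_sequence(lam, delta, 30)
limit = 2 / (1 + root)                # fixed point of y = 1 / (1 + lam * delta * y)
# Ratio of two successive gaps |y_k - y_{k-1}|; it should approach rho.
gap_ratio = abs(ys[-1] - ys[-2]) / abs(ys[-2] - ys[-3])
```

The sequence oscillates around its limit, so consecutive terms bracket the fixed point.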

Denote by $L^h$ the set of leaves of the tree $T^h$. Let $e = (e_u,\ u \in L^h)$ be a vector
where each component is in $\{0, 1\}$. Define $T^h(e)$ to be the tree obtained from $T^h$ in the
following way: every non-leaf node in $T^h$ remains in $T^h(e)$, and a leaf $u$ in $T^h$ remains in
$T^h(e)$ iff $e_u = 1$ (as a result, we see that the recursion (3) is valid with $x_u(v) = e_u$ for
all $u \in L^h$). Define the vector $\mathbf{x}^h(e) = (x_u^h(e),\ u \in T^h(e))$ as in (3). We denote by
$x^h(e)$ the value of the component of $\mathbf{x}^h(e)$ corresponding to the root of the tree. Simple
monotonicity arguments show that if $h$ is odd, then $y_{h-1} = x^h(0) \le x^h(e) \le x^h(1) = y_h$
and if $h$ is even, then $y_h = x^h(1) \le x^h(e) \le x^h(0) = y_{h-1}$. We define the vector $\mathbf{d}^h(e)$ by
$$d_u^h(e) = |x_u^h(1) - x_u^h(e)| \quad \text{for all } u \in T^h.$$

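For a node $u$ of depth $k$, we have:

$$\begin{aligned}
d_u^h(e) &= \left| \frac{1}{1 + \lambda \sum_{w \succ u} x_w^h(1)} - \frac{1}{1 + \lambda \sum_{w \succ u} x_w^h(e)} \right| \\
&\le \frac{\lambda \sum_{w \succ u} d_w^h(e)}{\left(1 + \lambda \sum_{w \succ u} x_w^h(1)\right)\left(1 + \lambda \sum_{w \succ u} x_w^h(1) - \lambda \sum_{w \succ u} d_w^h(e)\right)} \\
&\le \frac{\lambda \sum_{w \succ u} d_w^h(e)}{\left(1 + \lambda \sum_{w \succ u} y_{h-k}\right)\left(1 + \lambda \sum_{w \succ u} y_{h-k} - \lambda \min\left(\sum_{w \succ u} y_{h-k},\ C(\Delta) \Delta \rho^{h-k-1}\right)\right)} \\
&= \frac{\lambda y_{h-k+1}^2 \sum_{w \succ u} d_w^h(e)}{1 - \lambda y_{h-k+1} \min\left(\sum_{w \succ u} y_{h-k},\ C(\Delta) \Delta \rho^{h-k-1}\right)} = \frac{\lambda y_{h-k+1}^2}{g(h-k+1)} \sum_{w \succ u} d_w^h(e),
\end{aligned}$$

with $g(n) = 1 - \lambda y_n \min\left(\sum_{w \succ u} y_{n-1},\ C(\Delta) \Delta \rho^{n-2}\right) > 0$ for $n \ge 2$, where we used the
fact that $d_w^h(e) \le C(\Delta) \rho^{h-k-1}$ (see Lemma 3) in the last inequality and the fact that
$y_{h-k+1} = \left(1 + \lambda \sum_{w \succ u} y_{h-k}\right)^{-1}$ in the last equality. Hence we have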

For any $h'$, we have $\prod_{k \ge h'} g(k) \ge 1 - \sum_{k \ge h'} \lambda y_k C(\Delta) \Delta \rho^{k-2} \approx 1 - \frac{\lambda C(\Delta) \Delta}{\alpha (1 - \rho)} \rho^{h'-2}$.
Take $h'$ to be a constant large enough so that the above term is larger than 1/2. Then
$\prod_{2 \le k \le h} g(k) \ge \frac{1}{2} \prod_{2 \le k \le h'-1} g(k) =: C$, which is a constant. Thus



^, ^ - ^, . j2 ^ = O I ( ^ ) • > ^ e„ I . (6) 




► Proposition 4. Any deterministic approximation algorithm for the marginal $p_{G,\lambda}(v)$ within
an additive error $\epsilon$ for an arbitrary vertex $v$ in a graph with maximal degree $\Delta$ requires
$\Omega(\min(Q(\epsilon, \Delta), |G|))$ queries where $Q(\epsilon, \Delta) = \Delta^{h(2\epsilon, \Delta)}$. In particular, $Q(\epsilon, \Delta)$ satisfies (5).

Proof. Clearly, we need only deal with the case $Q(\epsilon, \Delta) < |G|$. Suppose $\mathcal{A}$ is an
approximation algorithm for the marginal $p_{G,\lambda}(v)$ using $o(\Delta^h)$ queries, where $h = h(2\epsilon, \Delta)$.
Then $\mathcal{A}$ can visit at most $M = o(\Delta^h)$ positions in the $h$th level. Let $e^-$ (resp. $e^+$) be a
vector in $\{0, 1\}^{L^h}$, where the $M$ visited positions have fixed values and all other positions
have value 0 (resp. value 1). $\mathcal{A}$ cannot distinguish the two trees $T^h(e^-)$ and $T^h(e^+)$. By
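Equation (6), $|x^h(e^+) - x^h(1)| = O\left(\left(\frac{\lambda}{\alpha^2}\right)^h\right) \cdot M = o\left(\left(\frac{\lambda \Delta}{\alpha^2}\right)^h\right) = o(|\beta/\alpha|^h) = o(\epsilon)$. Similarly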

$|x^h(e^-) - x^h(0)| = o(\epsilon)$. Since $|x^h(1) - x^h(0)| \ge 2\epsilon$, we have $|x^h(e^-) - x^h(e^+)| > \epsilon$. So $\mathcal{A}$
fails to give an $\epsilon$-approximation of $p_{G,\lambda}(v)$ using $o(\Delta^h)$ queries. The proposition then holds
by taking $Q(\epsilon, \Delta) = \Delta^{h(2\epsilon, \Delta)}$.

Notice that the trees studied above have maximal degree $\Delta + 1$ but changing $\Delta$ to $\Delta + 1$
will not affect the statement of the proposition. ◄

► Remark. As noted in the introduction, the model with $\lambda_e$ varying across the edges $e \in E$
is of practical interest (it allows one to model various intensities on edges). As soon as there
exists $\lambda_{\max}$ such that for all $e \in E$, we have $\lambda_e \in [0, \lambda_{\max}]$, it is easy to extend the results of
this section to the more general model defined by (note that $\lambda$ is now a vector in $[0, \lambda_{\max}]^E$):
$\pi_{G,\lambda}(M) = \frac{\prod_{e \in M} \lambda_e}{Z(G, \lambda)}$ where $Z(G, \lambda) = \sum_{M \in \mathcal{M}} \prod_{e \in M} \lambda_e$. Results in this section and the next
one hold provided $\lambda$ is replaced by $\lambda_{\max}$.

3 Monomer-dimer systems

We first recall a basic lemma which follows from Hoeffding's inequality (see [1]) and which
will be used several times in the sequel:

► Lemma 5. Let $V$ be a set of $n$ real numbers in $[A, B]$, where $A$ and $B$ are constant. Let
$V'$ be a subset of $V$ consisting of $\Theta(1/\epsilon^2)$ elements chosen uniformly and independently
at random. Let AVG be the average of all elements and AVG' be the average of the sampled
elements. Then with high constant probability, we have: AVG' $- \epsilon \le$ AVG $\le$ AVG' $+ \epsilon$.
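To make the constant hidden in $\Theta(1/\epsilon^2)$ explicit, one can spell out the standard two-sided Hoeffding bound for $s$ independent uniform samples. The failure probability $\delta$ below is a symbol introduced here for illustration; the lemma only needs it to be a fixed constant.

```latex
\Pr\left[\,\left|\mathrm{AVG}' - \mathrm{AVG}\right| \ge \epsilon\,\right]
  \;\le\; 2\exp\!\left(-\frac{2 s \epsilon^{2}}{(B-A)^{2}}\right),
\qquad\text{hence}\qquad
s \;\ge\; \frac{(B-A)^{2}}{2\epsilon^{2}}\,\ln\frac{2}{\delta}
\;\Longrightarrow\;
\Pr\left[\,\left|\mathrm{AVG}' - \mathrm{AVG}\right| \ge \epsilon\,\right] \le \delta .
```

For constant $A$, $B$ and $\delta$, this sample size is indeed of order $1/\epsilon^2$.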



3.1 Approximating the partition function 

The following formula, which allows us to compute the partition function from the marginals,
is obtained easily from (1):

$$Z(G, \lambda) = \prod_{1 \le k \le n} p^{-1}_{G_k,\lambda}(v_k), \qquad (7)$$

for any enumeration $v_1, v_2, \ldots, v_n$ of the vertices of $G$.

The following algorithm estimates $\log Z(G, \lambda)$. We sample $\Theta(1/\epsilon^2)$ numbers uniformly
at random from $\{1, \ldots, n\}$ and compute an $\epsilon/2$-approximation of the marginal
$p_{G_k,\lambda}(v_k)$ for every sampled number $k$. Note that the underlying graph is $G_k$ instead of $G$,
so we set the parameter $k_0$ to $k$ in Algorithm Approx-Marginal, which means that we
explore only vertices with identifiers at least $k$. The constant $C$ below is fixed in advance.

Approx-Partition-Function(λ, ε)
    tmp ← 0
    s ← ⌈C/ε²⌉
    for i ← 1 to s
        do k ← Random(1, n)
           tmp ← tmp − log(Approx-Marginal(λ, ε/2, k, k))
    return exp(tmp/s * n)

► Theorem 6. Approx-Partition-Function($\lambda, \epsilon$) is an $\epsilon n$-approximation algorithm for
$\log Z(G, \lambda)$ with query complexity $\tilde{O}\left((1/\epsilon)^{\tilde{O}(\sqrt{\Delta})}\right)$.

Proof. Let $A = -\sum_{k=1}^{n} \log(\text{Approx-Marginal}(\lambda, \epsilon/2, k, k))$. By Proposition 2 and
Equation (7), $A$ is an $\epsilon n/2$-approximation of $\log Z(G, \lambda)$. By Lemma 5, approximating the
marginal probability at $\Theta(1/\epsilon^2)$ sampled nodes gives an $\epsilon n/2$-approximation of $A$ with high
probability. This implies an $\epsilon n$-approximation of $\log Z(G, \lambda)$ with high constant probability.
The query complexity of this algorithm is $\Theta(1/\epsilon^2) \cdot Q(\epsilon/2, \Delta) = \tilde{O}\left((1/\epsilon)^{\tilde{O}(\sqrt{\Delta})}\right)$.
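The sampling scheme can be sketched in Python as follows. This is an illustrative sketch under our own assumptions, not the authors' implementation: vertices are labelled $0, \ldots, n-1$ and these labels serve as identifiers, the adjacency dictionary `adj` replaces the oracles, the constant `c` is an arbitrary choice, and the function returns the estimate of $\log Z(G, \lambda)$ directly (rather than its exponential) to avoid overflow.

```python
import math
import random

def dfs(adj, lam, path, h, k0):
    """Truncated path-tree recursion restricted to identifiers >= k0."""
    if len(path) == h:
        return 1.0
    total = 1.0
    for u in adj[path[-1]]:
        if u not in path and u >= k0:
            total += lam * dfs(adj, lam, path + (u,), h, k0)
    return 1.0 / total

def approx_marginal(adj, lam, eps, k, k0):
    """Deepen the truncation until two consecutive estimates agree in log scale."""
    prev, h = dfs(adj, lam, (k,), 1, k0), 2
    cur = dfs(adj, lam, (k,), h, k0)
    while abs(math.log(cur) - math.log(prev)) > eps:
        h += 1
        prev, cur = cur, dfs(adj, lam, (k,), h, k0)
    return cur

def approx_log_partition(adj, lam, eps, c=10.0, rng=None):
    """Estimate log Z(G, lam) from ceil(c / eps^2) sampled vertices."""
    rng = rng or random.Random()
    n = len(adj)
    s = math.ceil(c / eps ** 2)
    total = 0.0
    for _ in range(s):
        k = rng.randrange(n)      # uniform vertex; G_k keeps identifiers >= k
        total -= math.log(approx_marginal(adj, lam, eps / 2, k, k))
    return total / s * n

# Ten disjoint edges (2i, 2i+1): Z = (1 + lam)^10 exactly.
adj = {}
for i in range(10):
    adj[2 * i], adj[2 * i + 1] = [2 * i + 1], [2 * i]
estimate = approx_log_partition(adj, 1.0, 0.05, rng=random.Random(0))
```

On this toy instance the exact value is $10 \log 2$, and the estimate should fall within $\epsilon n = 1$ of it with overwhelming probability.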

Note that the size of any maximal matching is always lower bounded by $\frac{m}{2\Delta - 1}$, where
$m$ is the number of edges. In particular, since $Z(G, 1)$ is the total number of matchings,
we have $\frac{m}{2\Delta - 1} \log 2 \le \log Z(G, 1) \le m \log 2 \le \frac{n\Delta}{2} \log 2$ so that if $m = \Omega(n)$, we also have
$\log Z(G, 1) = \Theta(n)$. Hence, if $\epsilon$ and $\Delta$ are constants and $m = \Omega(n)$, the error in the
output of our algorithm is of the same order as the evaluated quantity. This is in contrast
with the FPTAS (Fully Polynomial-Time Approximation Scheme) in [1] or the FPRAS
(Fully Polynomial-time Randomized Approximation Scheme) in |1H I19j, which output an
$\epsilon$-approximation instead of an $\epsilon n$-approximation. Of course, we can let $\epsilon$ tend to 0 with
$n$ like $c/n$ in Theorem 6 so that our result (when $\Delta$ is constant) is consistent with the
FPTAS result of [1]. Indeed, in this case, clearly no sampling is required and if we replace
the sampling step by a visit of each vertex, our algorithm is the same as in [1].

When we assume $\Delta$ to be fixed, the query complexity of the above algorithm is polynomial
in $1/\epsilon$. Next we prove a lower bound on the query complexity which is quadratic in $1/\epsilon$. In
the proof, we use a lower bound result from [5], which is based on Yao's minimax principle.

For $s \in \{0, 1\}$, let $\mathcal{D}_s$ denote the distribution induced by setting a binary random variable
to 1 with probability $p_s = (1 + (-1)^s \epsilon)/2$ (and 0 otherwise). We define a distribution $\mathcal{D}$ on $m$-bit
strings as follows: (1) pick $s = 1$ with probability 1/2; (2) draw a random string from $\{0, 1\}^m$
by choosing each bit $b_i$ from $\mathcal{D}_s$ independently. The following lemma is proved in [5].

► Lemma 7. Any probabilistic algorithm that can guess the value of $s$ with a probability of
error below 1/4 requires $\Omega(1/\epsilon^2)$ bit lookups on average.

In order to get the lower bound on the query complexity of $\log Z(G, \lambda)$, the idea is to create an
$n$-node random graph $G_s$ depending on $s \in \{0, 1\}$ such that $\log Z(G_0, \lambda) - \log Z(G_1, \lambda) \ge \rho \epsilon n$
for some constant $\rho$ with high probability. So if there exists a $(\rho \epsilon n/3)$-approximation algorithm
for $\log Z(G, \lambda)$ using $o(1/\epsilon^2)$ queries, then we can differentiate $G_0$ and $G_1$ and thus obtain the
value of $s$ with high probability using also $o(1/\epsilon^2)$ queries, which contradicts the lower
bound in Lemma 7.

► Theorem 8. Any probabilistic $\epsilon n$-approximation algorithm for $\log Z(G, \lambda)$ needs $\Omega(1/\epsilon^2)$
queries on average. It is assumed that $\epsilon > C/\sqrt{n}$ for some large enough constant $C$.

Proof. Consider the graph $G$ consisting of $n$ isolated vertices $v_1, \ldots, v_n$. Pick $s \in \{0, 1\}$
uniformly at random and take a random $\lfloor n/2 \rfloor$-bit string $b_1, \ldots, b_{\lfloor n/2 \rfloor}$ with bits drawn from
$\mathcal{D}_s$ independently. Next, add an edge between $v_{2i-1}$ and $v_{2i}$ if and only if $b_i = 1$. Notice that
the function $\log Z(G, \lambda)$ is additive over disjoint components, so $\log Z(G_s, \lambda) = \sum_{i=1}^{\lfloor n/2 \rfloor} X_i$
where $\{X_i\}_{1 \le i \le \lfloor n/2 \rfloor}$ are independent random variables, and each $X_i$ equals $\log(1 + \lambda)$ with
probability $(1 + (-1)^s \epsilon)/2$ and equals 0 otherwise. For any two graphs $G_0$ and $G_1$ derived
from $\mathcal{D}_0$ and $\mathcal{D}_1$ respectively, we have $E[\log Z(G_0, \lambda)] - E[\log Z(G_1, \lambda)] = \log(1 + \lambda) \cdot \epsilon \lfloor n/2 \rfloor$.
When $\epsilon > C/\sqrt{n}$ for some constant $C$ large enough, we have $|E[\log Z(G_0, \lambda)] - \log Z(G_0, \lambda)| \le
\log(1 + \lambda) \cdot \epsilon n/10$ and $|E[\log Z(G_1, \lambda)] - \log Z(G_1, \lambda)| \le \log(1 + \lambda) \cdot \epsilon n/10$ with high probability.
Thus $\log Z(G_0, \lambda) - \log Z(G_1, \lambda) \ge \log(1 + \lambda) \cdot \epsilon n/5$ with high probability. Together with
Lemma 7, we know that any probabilistic $(\log(1 + \lambda) \cdot \epsilon n/15)$-approximation algorithm for
$\log Z(G, \lambda)$ needs $\Omega(1/\epsilon^2)$ queries on average, thus the statement of the theorem follows. ◄
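The hard instances used in this proof are simple to generate. The sketch below (function names, seed and parameter values are illustrative assumptions) builds $G_0$ and $G_1$ on vertices $0, \ldots, n-1$ and uses the additivity of $\log Z$ over components, each present edge contributing $\log(1 + \lambda)$.

```python
import math
import random

def hard_instance(n, eps, s, rng):
    """Edges of G_s: pair (2i, 2i+1) is joined with probability (1 + (-1)^s eps) / 2."""
    p = (1 + (-1) ** s * eps) / 2
    return [(2 * i, 2 * i + 1) for i in range(n // 2) if rng.random() < p]

def log_partition(edges, lam):
    """Components are isolated vertices or single edges, so log Z is additive."""
    return len(edges) * math.log(1 + lam)

rng = random.Random(1)
n, eps, lam = 100_000, 0.1, 1.0
g0 = hard_instance(n, eps, 0, rng)
g1 = hard_instance(n, eps, 1, rng)
gap = log_partition(g0, lam) - log_partition(g1, lam)
threshold = math.log(1 + lam) * eps * n / 5
```

With these parameters the expected gap is $\log(1 + \lambda) \cdot \epsilon n / 2$, far above the threshold $\log(1 + \lambda) \cdot \epsilon n / 5$, so an algorithm accurate to a third of the gap could tell $G_0$ from $G_1$.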

3.2 Approximating matching statistics

We define the average size $E(G, \lambda)$ and the entropy $S(G, \lambda)$ of a matching sampled from the
distribution $\pi_{G,\lambda}$ by:

$$E(G, \lambda) = \sum_{M \in \mathcal{M}} |M| \, \pi_{G,\lambda}(M) \quad \text{and} \quad S(G, \lambda) = -\sum_{M \in \mathcal{M}} \pi_{G,\lambda}(M) \log \pi_{G,\lambda}(M).$$

The following algorithm estimates $E(G, \lambda)$, where $C$ is a chosen constant.

Approx-Matching-Statistics(λ, ε)
    tmp ← 0
    s ← ⌈C/ε²⌉
    for i ← 1 to s
        do k ← Random(1, n)
           tmp ← tmp + Approx-Marginal(λ, ε/2, k, 0)
    return (n − tmp/s * n)/2

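► Theorem 9. Approx-Matching-Statistics($\lambda, \epsilon$) is an $\epsilon n$-approximation algorithm for
$E(G, \lambda)$ with query complexity $\tilde{O}\left((1/\epsilon)^{\tilde{O}(\sqrt{\Delta})}\right)$. In addition, any $\epsilon n$-approximation algorithm
for $E(G, \lambda)$ needs $\Omega(1/\epsilon^2)$ queries.

Proof. For $1 \le k \le n$, let $EST_k$ be the output of Approx-Marginal($\lambda, \epsilon/2, k, 0$). Then
$\log EST_k - \epsilon/2 \le \log p_{G,\lambda}(v_k) \le \log EST_k + \epsilon/2$ by Proposition 2, hence $EST_k - \epsilon/2 \le
p_{G,\lambda}(v_k) \le EST_k + \epsilon/2$. So $A = \sum_{k=1}^{n} EST_k$ is an $\epsilon n/2$-approximation of $\sum_{v \in V} p_{G,\lambda}(v)$.
By Lemma 5, taking $\Theta(1/\epsilon^2)$ sampled nodes gives an $\epsilon n/2$-approximation of $A$ with high
probability. This implies an $\epsilon n$-approximation of $\sum_{v \in V} p_{G,\lambda}(v)$ with high probability. Since
each dimer covers exactly two vertices, $E(G, \lambda) = \left(n - \sum_{v \in V} p_{G,\lambda}(v)\right)/2$, and we thus get an $\epsilon n$-approximation of $E(G, \lambda)$ with high
probability. The query complexity of this algorithm is $\Theta(1/\epsilon^2) \cdot Q(\epsilon/2, \Delta) = \tilde{O}\left((1/\epsilon)^{\tilde{O}(\sqrt{\Delta})}\right)$.
The lower bound on the query complexity is obtained similarly as in the proof of Theorem 8. ◄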
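The link between the average matching size and the vertex marginals rests on a counting identity: a matching $M$ leaves exactly $n - 2|M|$ vertices uncovered, so $E(G, \lambda) = \left(n - \sum_{v} p_{G,\lambda}(v)\right)/2$. The brute-force sketch below checks this on a small graph; the graph, the activity and the function names are arbitrary choices for illustration.

```python
from itertools import combinations

def matchings(edges):
    """Yield every matching of the graph (tiny graphs only)."""
    for k in range(len(edges) + 1):
        for subset in combinations(edges, k):
            ends = [v for e in subset for v in e]
            if len(ends) == len(set(ends)):
                yield subset

def gibbs_statistics(n, edges, lam):
    """Return (average matching size, list of p(v)) under the Gibbs distribution."""
    z, size, uncovered = 0.0, 0.0, [0.0] * n
    for m in matchings(edges):
        w = lam ** len(m)
        z += w
        size += len(m) * w
        covered = {v for e in m for v in e}
        for v in range(n):
            if v not in covered:
                uncovered[v] += w
    return size / z, [u / z for u in uncovered]

# Triangle 0-1-2 with a pendant vertex 3 attached to 2.
n = 4
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
average_size, p = gibbs_statistics(n, edges, 0.7)
from_marginals = (n - sum(p)) / 2
```

Replacing the exact sum of marginals by $n$ times the average over sampled vertices gives the sampling estimator analysed in the theorem.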

To estimate the entropy $S(G, \lambda)$, we use the following relation, which is easy to prove.

$$S(G, \lambda) = \log Z(G, \lambda) - \log \lambda \cdot E(G, \lambda). \qquad (8)$$

► Corollary 10. We have an $\epsilon n$-approximation algorithm for $S(G, \lambda)$ with query complexity
$\tilde{O}\left((1/\epsilon)^{\tilde{O}(\sqrt{\Delta})}\right)$. In addition, any $\epsilon n$-approximation algorithm for $S(G, \lambda)$ needs $\Omega(1/\epsilon^2)$
queries.

Proof. Let $Z$ be the output of Approx-Partition-Function($\lambda, \epsilon/2$) and $E$ be the output
of Approx-Matching-Statistics($\lambda, \epsilon/(2 \log \lambda)$). By Theorem 6, Theorem 9 and
Equation (8), $\log Z - \log \lambda \cdot E$ is an $\epsilon n$-estimate of $S(G, \lambda)$ with high probability. Both $Z$ and $E$
can be computed using $\tilde{O}\left((1/\epsilon)^{\tilde{O}(\sqrt{\Delta})}\right)$ queries. The lower bound on the query complexity is
obtained similarly as in the proof of Theorem 8. ◄
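For completeness, the relation between entropy, partition function and average matching size follows in a few lines from the definition $\pi_{G,\lambda}(M) = \lambda^{|M|}/Z(G,\lambda)$ and the fact that the probabilities sum to one:

```latex
\begin{aligned}
S(G,\lambda)
  &= -\sum_{M\in\mathcal{M}} \pi_{G,\lambda}(M)\,\log\frac{\lambda^{|M|}}{Z(G,\lambda)} \\
  &= \log Z(G,\lambda)\sum_{M\in\mathcal{M}} \pi_{G,\lambda}(M)
     \;-\; \log\lambda \sum_{M\in\mathcal{M}} |M|\,\pi_{G,\lambda}(M) \\
  &= \log Z(G,\lambda) \;-\; \log\lambda\cdot E(G,\lambda).
\end{aligned}
```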

3.3 Some extensions 

So far, we have considered the parameter $\lambda$ to be fixed. Note that for a fixed graph $G$, if
$\lambda \to \infty$ then the distribution $\pi_{G,\lambda}$ converges toward the uniform distribution on maximum
matchings. Indeed, using a bound derived in [3], we can show that if $\lambda$ grows exponentially
with $1/\epsilon$, our technique makes it possible to approximate the size of a maximum matching (see Section A).
However, our algorithm performs badly compared with [24].

Letting $\lambda$ grow also yields new results for the permanent of a 0-1 matrix.
When the matrix is the adjacency matrix of a constant degree expander graph, the best
previous algorithm gives an FPRAS to estimate the permanent within a multiplicative factor
$(1 + \epsilon)^n$ (see [5]). We improve this result by providing a constant-time algorithm within the
same multiplicative factor (see Section B).

4 Other systems 

We now show how our technique extends to other systems. First, we would like to stress that
[10] shows how the ferromagnetic Ising model (possibly with non-zero magnetic field) can be
put in one-to-one correspondence with the monomer-dimer problem on a suitably chosen
weighted graph obtained through local perturbation of the original graph. In particular, this
allows us to transfer directly the results obtained in previous sections to the ferromagnetic Ising
model. We now consider two anti-ferromagnetic systems with hard and soft
constraints, respectively.

4.1 Independent sets 

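Let $\mathcal{I}$ be the set of independent sets of $G$. The partition function of the system is defined
by $Z_I(G, \lambda) = \sum_{I \in \mathcal{I}} \lambda^{|I|}$, and the Gibbs distribution on the space $\mathcal{I}$ is defined by $\pi_{G,\lambda}(I) =
\frac{\lambda^{|I|}}{Z_I(G, \lambda)}$. For every $v \in V$, define $p_{G,\lambda}(v) := \pi_{G,\lambda}(v \notin I) = \sum_{I \not\ni v} \pi_{G,\lambda}(I)$, where $I \not\ni v$ is an
independent set not containing $v$.

Notice that $1/Z_I(G, \lambda)$ is exactly the probability of the empty set, which is also equal to
$\prod_{1 \le k \le n} p_{G_k,\lambda}(v_k)$, where $G_k = G \setminus \{v_1, \ldots, v_{k-1}\}$. Hence we have:

$$Z_I(G, \lambda) = \prod_{1 \le k \le n} p^{-1}_{G_k,\lambda}(v_k).$$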

l<k<n 

However, it is well known that the correlation decay implying a result similar to Lemma 1
does not hold for all values of $\lambda$. Indeed, Weitz in [22] gave an FPTAS for estimating $Z_I(G, \lambda)$
up to the critical activity for the uniqueness of the Gibbs measure on the infinite $\Delta$-regular
tree, and we can adapt his approach here.
The following lemma is a direct application of Theorem 2.4 and Proposition 2.5 in
(and gives us the analogue of Lemma 1 in the case of matchings).

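► Lemma 11. For any $\Delta$ and any $0 < \lambda < \lambda_c(\Delta) = \frac{\Delta^\Delta}{(\Delta - 1)^{\Delta + 1}}$, the independent set model with activity $\lambda$ on any graph of maximal degree $\Delta + 1$ exhibits
strong spatial mixing with rate $\delta(l) = O(e^{-\alpha l})$ for some positive $\alpha$.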

An approach similar to that of Section 3 leads to the following result.

► Proposition 12. Let $0 < \lambda < \lambda_c(\Delta - 1)$. We have an $\epsilon n$-approximation algorithm for
$\log Z_I(G, \lambda)$ with query complexity polynomial in $1/\epsilon$ for any graph $G$ with maximal degree $\Delta$.
In addition, any $\epsilon n$-approximation algorithm for $\log Z_I(G, \lambda)$ needs $\Omega(1/\epsilon^2)$ queries.

Proof. Take a sampling of 0{l/e^) vertices. For every sampled vertex, estimate logpG^.\{vk) 
with an additive error e' — 2iog(\+A) exploring a constant-size path-tree rooted at this 
vertex (also called self-avoiding- walk in [22] ) • This is equivalent to find two bounds pi and 
P2 such that pi < PGk.\{vk) < P2 and logp2 - logpi < e' . 

Let $\varepsilon'' = \frac{\varepsilon'}{2(1+\lambda)}$ and truncate the path-tree at depth $h = \log(1/\varepsilon'')/\alpha + O(1)$. Set all leaves
to 0 and to 1 to get two results $p_1$ and $p_2$ (assume $p_1 \le p_2$). Then $p_1 \le p_{G_k,\lambda}(v_k) \le p_2$. By
the definition of strong spatial mixing, $p_2 - p_1 \le \delta(h) \le \varepsilon''$, so $\log p_2 - \log p_1 \le \log\left(1 + \frac{\varepsilon''}{p_1}\right) \le
\log(1 + (1 + \lambda)\varepsilon'') \le \varepsilon'$, where the second inequality holds since $p_1 \ge \frac{1}{1+\lambda}$. The number of
queries used by the algorithm is $O(\Delta^h/\varepsilon^2)$, where $h = O(\log(1/\varepsilon))$. The lower bound on the
query complexity is obtained as in the proof of Theorem 8. ◀
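The truncation step can be illustrated in the simplest setting, where the graph itself is a complete $d$-ary tree, so that the path-tree coincides with the graph. This is a minimal sketch under that assumption; the function names and the choice $d = 2$, $\lambda = 1$ are hypothetical. It uses the standard tree recursion $p_v = 1/(1 + \lambda \prod_c p_c)$ over the children $c$ of $v$, which is assumed here rather than quoted from the text.

```python
import math

def truncated_root_value(lam, d, depth, boundary):
    """Root value of p = P(root not in I) on a complete d-ary tree when every
    vertex at distance `depth` from the root is assigned the value `boundary`."""
    p = boundary
    for _ in range(depth):
        p = 1.0 / (1.0 + lam * p ** d)
    return p

def truncation_bounds(lam, d, depth):
    """Bounds (p1, p2) obtained by setting the truncated leaves to 0 and to 1."""
    a = truncated_root_value(lam, d, depth, 0.0)
    b = truncated_root_value(lam, d, depth, 1.0)
    return min(a, b), max(a, b)

def exact_root_value(lam, d, tree_depth):
    """Exact p at the root of a complete d-ary tree of depth `tree_depth`.
    A true leaf has no children, so its empty product equals 1."""
    return truncated_root_value(lam, d, tree_depth + 1, 1.0)

lam, d = 1.0, 2          # lam = 1 is below the critical activity 4 of the binary tree
p1, p2 = truncation_bounds(lam, d, depth=20)
p_exact = exact_root_value(lam, d, tree_depth=40)
log_gap = math.log(p2) - math.log(p1)
```

Since $p_1 \ge 1/(1+\lambda)$, the gap `log_gap` is at most $(1+\lambda)(p_2 - p_1)$, which is the inequality used in the proof.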

In particular, if $\Delta \le 5$, we have $\lambda_c(\Delta - 1) > 1$, so that we can approximate $Z_I(G, 1)$,
which is the number of independent sets of $G$.
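The degree threshold can be verified numerically. The sketch below assumes Weitz's formula $\lambda_c(d) = d^d/(d-1)^{d+1}$ for the critical activity; the function names are hypothetical, and the value returned for $d = 1$ is a convention (the limit of the formula), not something stated in the text.

```python
import math

def critical_activity(d):
    """Assumed formula lambda_c(d) = d^d / (d - 1)^(d + 1), for integer d >= 1."""
    if d < 1:
        raise ValueError("d must be a positive integer")
    if d == 1:
        return math.inf  # convention: the formula diverges as d -> 1
    return d ** d / (d - 1) ** (d + 1)

def largest_degree_for_counting(max_checked=50):
    """Largest Delta <= max_checked with lambda_c(Delta - 1) > 1, i.e. the
    largest maximal degree for which the activity lam = 1 is covered."""
    best = None
    for delta in range(2, max_checked + 1):
        if critical_activity(delta - 1) > 1.0:
            best = delta
    return best

lambda_c_4 = critical_activity(4)   # 256 / 243
lambda_c_5 = critical_activity(5)   # 3125 / 4096
threshold = largest_degree_for_counting()
```

With this formula, $\lambda_c(4) = 256/243 > 1$ while $\lambda_c(5) = 3125/4096 < 1$, which is why the condition stops at $\Delta = 5$.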

4.2 Ising Model 

In the Ising model, each vertex in the graph $G = (V, E)$ is in one of two states, referred
to as "$+$" and "$-$". Such a system can be defined by specifying an edge activity $\beta$ and a
vertex activity $\lambda$. When $\beta < 1$, the Ising model is called anti-ferromagnetic. A configuration
$\sigma : V \to \{+, -\}$ is an assignment of "$+$" and "$-$" to the vertices of $G$. The weight $w(\sigma)$
of the configuration $\sigma$ is given by $w(\sigma) = \lambda^{m(\sigma)} \beta^{n(\sigma)}$, where $m(\sigma)$ denotes the number of
vertices assigned state "$-$" and $n(\sigma)$ denotes the number of edges for which both endpoints
are assigned the same state. The partition function of the model is defined as

$$Z_S(G, \lambda, \beta) = \sum_{\sigma} w(\sigma).$$
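The weight of a configuration can be made concrete with a brute-force sketch. It assumes the partition function is the sum of $w(\sigma)$ over all $2^{|V|}$ configurations; the function names and the single-edge example are hypothetical, and the enumeration is only feasible for very small graphs.

```python
from itertools import product

def ising_weight(sigma, edges, lam, beta):
    """w(sigma) = lam^m(sigma) * beta^n(sigma), where m counts vertices in
    state '-' and n counts edges whose endpoints share the same state."""
    m = sum(1 for state in sigma.values() if state == "-")
    n = sum(1 for u, v in edges if sigma[u] == sigma[v])
    return lam ** m * beta ** n

def ising_partition_function(vertices, edges, lam, beta):
    """Sum of w(sigma) over all assignments of '+' and '-' to the vertices."""
    total = 0.0
    for states in product("+-", repeat=len(vertices)):
        sigma = dict(zip(vertices, states))
        total += ising_weight(sigma, edges, lam, beta)
    return total

# Hypothetical example: a single edge with lam = 2 and beta = 0.5 (< 1, so
# anti-ferromagnetic). The four weights are 0.5, 2, 2 and 2.
z_edge = ising_partition_function([0, 1], [(0, 1)], lam=2.0, beta=0.5)
```

For a single edge the four configurations contribute $\beta$, $\lambda$, $\lambda$ and $\lambda^2\beta$, so the sum is $\beta + 2\lambda + \lambda^2\beta$.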

Using a proof similar to that of Proposition 12, we have:

► Proposition 13. Let $\Delta \ge 3$. Consider an anti-ferromagnetic Ising model with parameters
$\beta$ and $\lambda$, where $\beta$ and $\lambda$ are in the interior of the uniqueness region of the $(\Delta - 1)$-ary tree.
There is an $\varepsilon n$-approximation algorithm for $\log Z_S(G, \lambda, \beta)$ with query complexity polynomial
in $1/\varepsilon$ for any graph $G$ with maximal degree $\Delta$.
A Size of maximum matching



As $\lambda \to \infty$, $E(G, \lambda)$ tends to the size of a maximum matching, which we denote by MM.
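This limit can be observed on a small example. The sketch below assumes that $E(G, \lambda)$ is the average size of a matching under the Gibbs distribution, that is $\sum_k k\, m_k \lambda^k / \sum_k m_k \lambda^k$. It also checks the bound $MM - E(G, \lambda) \le m \log 2 / \log \lambda$ for $\lambda > 1$, which follows from a standard convexity argument and is an assumption here rather than a statement taken from the text. The function names and the path graph are hypothetical.

```python
from itertools import combinations
from math import log

def matching_counts(edges):
    """Return a list counts with counts[k] = number of matchings of size k."""
    counts = [0] * (len(edges) + 1)
    for k in range(len(edges) + 1):
        for subset in combinations(edges, k):
            ends = [v for e in subset for v in e]
            if len(ends) == len(set(ends)):  # edges are pairwise disjoint
                counts[k] += 1
    return counts

def average_matching_size(edges, lam):
    """E(G, lam): expected size of a matching drawn with weight lam^|M|."""
    counts = matching_counts(edges)
    z = sum(c * lam ** k for k, c in enumerate(counts))
    return sum(k * c * lam ** k for k, c in enumerate(counts)) / z

def maximum_matching_size(edges):
    """MM: the largest k with at least one matching of size k."""
    counts = matching_counts(edges)
    return max(k for k, c in enumerate(counts) if c > 0)

# Hypothetical example: the path 0-1-2-3, with matching counts 1, 3, 1 and MM = 2.
path_edges = [(0, 1), (1, 2), (2, 3)]
mm = maximum_matching_size(path_edges)
gaps = {lam: mm - average_matching_size(path_edges, lam) for lam in (10.0, 100.0, 1000.0)}
```

The gap shrinks as $\lambda$ grows, and each gap stays below $m \log 2 / \log \lambda$ with $m = 3$ edges.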

► Lemma 14. For any $\varepsilon > 0$, take $\lambda = e^{\frac{\Delta \log 2}{2\varepsilon}}$; then we have $E(G, \lambda) \le MM \le E(G, \lambda) + \varepsilon n$.

Proof. Since $m \le \frac{\Delta n}{2}$ in a degree bounded graph, this lemma follows directly from Lemma
12 in 0, which proves that $E(G, \lambda) \le MM \le E(G, \lambda) + \frac{m \log 2}{\log \lambda}$.

$\ldots\left(e^{\Delta \log 2/\ldots}, \ldots\right)$ is an $\varepsilon n$-approximation algorithm for MM with query complexity $O\left((1/\varepsilon)^{\ldots}\right)$.



► Remark. The query complexity of our algorithm is doubly exponential in $\Delta/\varepsilon$, which is
outperformed by [24]. However, our algorithm is simpler and can be carried out using parallel
computing.

B Permanent of expander graphs

Consider an $n$ by $n$ bipartite graph $G$ with the node set $V = X \cup Y$, where $|X| = |Y| = n$.
For every $S \subseteq V$, denote by $N(S)$ the set of nodes adjacent to at least one node in $S$. For
$\alpha > 0$, a graph is an $\alpha$-expander if for every subset $S \subseteq X$ and every subset $S \subseteq Y$, as soon
as $|S| \le n/2$, the inequality $|N(S)| \ge (1 + \alpha)|S|$ holds. Let $A = (a_{i,j})$ be the corresponding
adjacency matrix of $G$, i.e., the rows and columns of $A$ are indexed by nodes of $X$ and $Y$
respectively, and $a_{i,j} = 1$ iff $(x_i, y_j)$ is an edge in $G$. Let PERM denote the permanent of $A$.
Computing the permanent of a matrix is known to be #P-complete, even when the
entries are limited to 0 and 1, so we look for an estimate of PERM. The following lemma
has been proved in [5].

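► Lemma 16. Let $G$ be an $n$ by $n$ bipartite $\alpha$-expander graph which is degree bounded by
$\Delta$. Then for every $\lambda > 0$, we have:

$$1 \le \frac{Z(G, \lambda)}{\lambda^n \, \mathrm{PERM}} \le e^{O\left(n \lambda^{-1} \log^{-1}(1 + \alpha) \log \Delta\right)}.$$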

We will estimate PERM within a multiplicative factor $e^{\varepsilon n}$ in constant time using the
Approx-Partition-Function algorithm. This improves the FPTAS algorithm in [H] with
the same approximation performance.
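The link between PERM and the partition function can be seen on a small example: PERM counts the perfect matchings of $G$, which is the coefficient of $\lambda^n$ in $Z(G, \lambda)$, so $Z(G, \lambda) \ge \lambda^n \, \mathrm{PERM}$ always holds. The brute-force sketch below is illustrative only; the function names and the 3 by 3 matrix (a 6-cycle) are hypothetical, and neither routine is the sublinear-time algorithm discussed here.

```python
from itertools import permutations

def permanent(a):
    """Permanent of a square 0/1 matrix by summing over all permutations."""
    n = len(a)
    total = 0
    for perm in permutations(range(n)):
        term = 1
        for i, j in enumerate(perm):
            term *= a[i][j]
        total += term
    return total

def monomer_dimer_partition_function(a, lam):
    """Z(G, lam) = sum over matchings M of lam^|M| for the bipartite graph
    whose rows (X side) and columns (Y side) are given by the matrix a."""
    n = len(a)

    def extend(row, used_columns):
        if row == n:
            return 1.0
        total = extend(row + 1, used_columns)  # leave x_row unmatched
        for j in range(n):
            if a[row][j] and not used_columns & (1 << j):
                total += lam * extend(row + 1, used_columns | (1 << j))
        return total

    return extend(0, 0)

# Hypothetical example: a 6-cycle, with matching counts 1, 6, 9, 2.
a_cycle = [[1, 1, 0],
           [0, 1, 1],
           [1, 0, 1]]
perm_cycle = permanent(a_cycle)
z_at_2 = monomer_dimer_partition_function(a_cycle, 2.0)
ratio = z_at_2 / (2.0 ** 3 * perm_cycle)
```

For this example $Z(G, 2) = 1 + 12 + 36 + 16 = 65$ and PERM $= 2$, so the ratio $Z/(\lambda^n \mathrm{PERM})$ is $65/16$. The ratio tends to 1 as $\lambda$ grows, which is what makes a large activity useful for estimating PERM.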

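► Proposition 17. Let $G$ be an $n$ by $n$ bipartite $\alpha$-expander graph and let $\varepsilon > 0$, where
$\alpha$ and $\Delta$ are constant. There is an $\varepsilon n$-approximation algorithm for $\log \mathrm{PERM}$ with query
complexity $\tilde{O}\left((1/\varepsilon)^{O(\sqrt{\ldots})}\right)$.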



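Proof. Take $\lambda = \Theta(\log \Delta/(\varepsilon\alpha))$ so that $\Theta(\lambda^{-1} \log^{-1}(1 + \alpha) \log \Delta) \le \varepsilon/2$ holds. Algorithm
Approx-Partition-Function($\lambda$, $\varepsilon/2$) uses $\ldots$ queries and provides with
high probability an $(\varepsilon n/2)$-approximation of $\log Z(G, \lambda)$; let it be $\hat{Z}$. From Lemma 16, $\hat{Z} - n \log \lambda$ is
an $\varepsilon n$-approximation of $\log \mathrm{PERM}$ with high probability.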

C Tests on large graphs

In this section, we show the performance of our algorithm for estimating the average size of a matching
$E(G, 1)$ on large real-world graphs from the Stanford Large Network Dataset Collection
(http://snap.stanford.edu/data/index.html). Our algorithm performs well on both small
degree graphs and small average-degree graphs. The tests are based on:
■ microprocessor: Intel Core i5 750 (2.67 GHz, 256 KB L2/core, 8 MB L3)
■ memory: 4 GB RAM
■ compiler: g++ version 4.4.3, option -O2
■ operating system: Linux Ubuntu 10.04

C.1 Small degree graphs

Consider the three road network graphs from the Stanford Large Network Dataset Collection, where
$\Delta$ is small. Intersections and endpoints are represented by nodes, and the roads connecting
these intersections or road endpoints are represented by undirected edges.
■ roadNet-CA: California road network with $n = 1965206$, $\Delta = 12$
■ roadNet-PA: Pennsylvania road network with $n = 1088092$, $\Delta = 9$
■ roadNet-TX: Texas road network with $n = 1379917$, $\Delta = 12$

We test our algorithm for decreasing values of $\varepsilon$. For a given $\varepsilon$, our program outputs
an estimate of $E(G, 1)$ within an error of $\varepsilon n$ with probability at least 2/3. The following
diagram gives the execution time of our program with respect to $1/\varepsilon$. The three curves in
Figure 1 correspond to the three graphs above.
■ Figure 1 Performance in degree bounded graphs

C.2 Small average-degree graphs 

We have seen that our algorithm works well on graphs with small $\Delta$. We now extend this
algorithm to graphs with small average degree, since these graphs are of practical interest
in the real world. Notice that in a graph with small average degree, the number of large
degree nodes is limited. The idea is to skip such nodes by returning rough bounds instead of
visiting their descendants in the path-tree during the DFS. Set Max-Degree to be the maximum
degree of a node that the DFS is going to visit. When Max-Degree is small, the estimate is
not very accurate (i.e., the upper bound and the lower bound might be far apart). When
Max-Degree equals $\Delta$, the output is exactly the desired $\varepsilon$-estimate. However, the query
complexity in the latter case is exponential in $\Delta$, and $\Delta$ might be of the same order as $n$ even
though the graph has small average degree. So we also set Max-Time as the maximum
amount of time to spend at every sampled node $v$: we calculate $x_v^1(v), x_v^2(v), \ldots, x_v^k(v), \ldots$,
and stop when the accumulated time of these calculations exceeds Max-Time (say this
happens while calculating $x_v^k(v)$). By Equation (4), $x_v^{k-1}(v)$ and $x_v^{k-2}(v)$ are two bounds
for $x_v(v)$. From the bounds at each of the $O(1/\varepsilon^2)$ sampled nodes, we get the bounds for
$E(G, 1)$. The time and query complexity is $O(1/\varepsilon^2)$, where the coefficient depends on the
parameter Max-Time.
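The idea of skipping high-degree nodes can be sketched as an interval version of the path-tree recursion. This is a minimal illustration under stated assumptions, not the tested implementation. It assumes the standard monomer-dimer recursion $x_G(v) = 1/(1 + \lambda \sum_{u \sim v} x_{G - v}(u))$ for the probability that $v$ is not covered, and uses the rough bounds $1/(1 + \lambda \deg(v)) \le x_G(v) \le 1$ at skipped nodes. The names `bounds`, `exact_uncovered_prob` and the small example graph are hypothetical. A wall-clock budget such as Max-Time would wrap `bounds` in a loop over increasing depth; it is omitted to keep the example deterministic.

```python
from itertools import combinations

def exact_uncovered_prob(edges, v, lam):
    """Brute-force probability that v is not covered by a Gibbs matching."""
    total = uncovered = 0.0
    for k in range(len(edges) + 1):
        for subset in combinations(edges, k):
            ends = [w for e in subset for w in e]
            if len(ends) != len(set(ends)):
                continue  # two edges share an endpoint: not a matching
            weight = lam ** k
            total += weight
            if v not in ends:
                uncovered += weight
    return uncovered / total

def bounds(adj, v, lam, depth, max_degree, removed=frozenset()):
    """Interval (lo, hi) containing x(v) in the graph with `removed` deleted.
    Nodes with more than max_degree remaining neighbours are not expanded."""
    nbrs = [u for u in adj[v] if u not in removed]
    if not nbrs:
        return 1.0, 1.0  # an isolated vertex is never covered
    if depth == 0 or len(nbrs) > max_degree:
        return 1.0 / (1.0 + lam * len(nbrs)), 1.0  # rough bounds
    children = [bounds(adj, u, lam, depth - 1, max_degree, removed | {v})
                for u in nbrs]
    # The recursion is decreasing in each child value, so the bounds swap.
    lo = 1.0 / (1.0 + lam * sum(c_hi for _, c_hi in children))
    hi = 1.0 / (1.0 + lam * sum(c_lo for c_lo, _ in children))
    return lo, hi

# Hypothetical example graph: node 0 has degree 4, every other node at most 2.
edges = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (3, 5)]
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

exact = exact_uncovered_prob(edges, 1, 1.0)
rough = bounds(adj, 1, 1.0, depth=5, max_degree=2)    # skips node 0
tight = bounds(adj, 1, 1.0, depth=10, max_degree=10)  # explores everything
```

With a small degree cap the interval `rough` is wide but still contains the exact value, and with no effective cap the interval `tight` collapses onto it, which mirrors the trade-off between Max-Degree and accuracy described above.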

Consider the following two graphs from the Stanford Large Network Dataset Collection.
■ Brightkite-edges with $n = 58228$, average degree = 3.7, $\Delta = 1134$: Brightkite was once a
location-based social networking service provider where users shared their locations by checking in;
the friendship network was collected using their public API.
■ CA-CondMat with $n = 23133$, average degree = 8.1, $\Delta = 280$: it is a collaboration network
of the arXiv Condensed Matter category; there is an edge if authors coauthored at least one
paper.

The performance of our algorithm on these two graphs is given in Figure 2. We test
the cases Max-Time = 0.1 and Max-Time = 1.0. For a given Max-Time, we increase
Max-Degree and get an upper bound and a lower bound of $E(G, 1)$ as output, where Error
indicates the ratio of the difference between the two bounds to $n$ (as a percentage).
■ Figure 2 Performance in small average-degree graphs

We see that our algorithm outputs an estimate of $E(G, 1)$ within a small percentage
of error for graphs with small average degree, even when $\Delta$ is large. The time and query
complexity depend only on $\varepsilon$ and Max-Time.



